Video encoding/decoding method and apparatus for motion compensation prediction

ABSTRACT

A video encoding method and apparatus select one combination, for each block of an input video signal, from a plurality of combinations. Each combination includes a predictive parameter and at least one reference picture number determined in advance for the reference picture. A prediction picture signal is generated in accordance with the reference picture number and predictive parameter of the selected combination. A predictive error signal is generated representing an error between the input video signal and the prediction picture signal. The predictive error signal, motion vector information, and index information indicating the selected combination are then encoded.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a divisional application of and claims the benefit of priority under 35 U.S.C. §120 from U.S. application Ser. No. 12/791,018, filed Jun. 1, 2010, the entirety of which is herein incorporated by reference. U.S. application Ser. No. 12/791,018 is a divisional application of U.S. application Ser. No. 12/694,320, filed on Jan. 27, 2010, which is a divisional application of U.S. application Ser. No. 12/635,738, filed on Dec. 11, 2009, which is a divisional application of U.S. application Ser. No. 12/577,437, filed on Oct. 12, 2009, which is a divisional application of U.S. application Ser. No. 12/323,930, filed on Nov. 26, 2008, which is a divisional application of U.S. application Ser. No. 11/687,923, filed Mar. 19, 2007, which is a divisional application of U.S. application Ser. No. 10/754,535, filed on Jan. 12, 2004, which is a continuation application of International Application No. PCT/JP03/04992, filed Apr. 18, 2003, which was not published under PCT Article 21(2) in English.

This application is based upon and claims the benefit of priority from the prior Japanese Patent Applications No. 2002-116718, filed Apr. 18, 2002; and No. 2002-340042, filed Nov. 22, 2002, the entire contents of both of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a video encoding/decoding method and apparatus which encode/decode, in particular, fade videos and dissolving videos at high efficiency.

2. Description of the Related Art

Motion compensation predictive inter-frame encoding is used as one of the encoding modes in video encoding standards such as ITU-T H.261, H.263, ISO/IEC MPEG-2, and MPEG-4. The predictive model used in motion compensation predictive inter-frame encoding exhibits its highest predictive efficiency when no change in brightness occurs in the time direction. For a fade video, in which the brightness of the pictures changes, no method has been known that makes a proper prediction against the change in brightness when, for example, a normal picture fades in from a black picture. A large number of bits is therefore required to maintain picture quality in a fade video as well.

To solve this problem, Japanese Patent No. 3166716, “Fade Countermeasure Video Encoder and Encoding Method”, for example, detects a fade video part and changes the allocation of the number of bits. More specifically, in the case of a fadeout video, a large number of bits is allocated to the start part of the fadeout, where luminance changes. In general, the last part of a fadeout becomes a monochrome picture and can therefore be encoded easily, so the number of bits allocated to this part is reduced. This makes it possible to improve the overall picture quality without excessively increasing the total number of bits.

Japanese Patent No. 2938412, “Video Luminance Change Compensation Method, Video Encoding Apparatus, Video Decoding Apparatus, Recording Medium on Which Video Encoding or Decoding Program Is Recorded, and Recording Medium on Which Encoded Data of Video Is Recorded”, proposes an encoding scheme that copes properly with a fade video by compensating a reference picture in accordance with two parameters, i.e., a luminance change amount and a contrast change amount.

Thomas Wiegand and Bernd Girod, “Multi-frame motion-compensated prediction for video transmission”, Kluwer Academic Publishers, 2001, proposes an encoding scheme based on a plurality of frame buffers. This scheme attempts to improve predictive efficiency by selectively generating a prediction picture from a plurality of reference frames held in the frame buffers.

With the conventional techniques, a large number of bits is required to encode a fade video or dissolving video while maintaining high picture quality, so no improvement in encoding efficiency can be expected.

BRIEF SUMMARY OF THE INVENTION

It is an object of the present invention to provide a video encoding/decoding method and apparatus which can encode, at high efficiency, a video whose luminance changes over time, e.g., a fade video or dissolving video.

According to a first aspect of the present invention, there is provided a video encoding method of subjecting an input video signal to motion compensation predictive encoding by using a reference picture signal representing at least one reference picture and a motion vector between the input video signal and the reference picture signal, comprising: selecting one combination, for each block of the input video signal, from a plurality of combinations each including a predictive parameter and at least one reference picture number determined in advance for the reference picture; generating a prediction picture signal in accordance with the reference picture number and predictive parameter of the selected combination; generating a predictive error signal representing an error between the input video signal and the prediction picture signal; and encoding the predictive error signal, information of the motion vector, and index information indicating the selected combination.

According to a second aspect of the present invention, there is provided a video decoding method comprising: decoding encoded data including a predictive error signal representing an error in a prediction picture signal with respect to a video signal, motion vector information, and index information indicating a combination of at least one reference picture number and a predictive parameter; generating a prediction picture signal in accordance with the reference picture number and predictive parameter of the combination indicated by the decoded index information; and generating a reproduction video signal by using the predictive error signal and the prediction picture signal.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

FIG. 1 is a block diagram showing the arrangement of a video encoding apparatus according to the first embodiment of the present invention;

FIG. 2 is a block diagram showing the detailed arrangement of a frame memory/prediction picture generator in FIG. 1;

FIG. 3 is a view showing an example of a table of combinations of reference frame numbers and predictive parameters, which is used in the first embodiment;

FIG. 4 is a flow chart showing an example of a sequence for selecting a predictive scheme (a combination of a reference frame number and a predictive parameter) for each macroblock and determining an encoding mode in the first embodiment;

FIG. 5 is a block diagram showing the arrangement of a video decoding apparatus according to the first embodiment;

FIG. 6 is a block diagram showing the detailed arrangement of the frame memory/prediction picture generator in FIG. 5;

FIG. 7 is a view showing an example of a table of combinations of predictive parameters in a case wherein the number of reference frames is one and a reference frame number is sent as mode information according to the second embodiment of the present invention;

FIG. 8 is a view showing an example of a table of combinations of predictive parameters in a case wherein the number of reference frames is two and a reference frame number is sent as mode information according to the second embodiment;

FIG. 9 is a view showing an example of a table of combinations of reference picture numbers and predictive parameters in a case wherein the number of reference frames is one according to the third embodiment of the present invention;

FIG. 10 is a view showing an example of a table for luminance signals only according to the third embodiment;

FIG. 11 is a view showing an example of a syntax for each block when index information is to be encoded;

FIG. 12 is a view showing a specific example of an encoded bit stream when a prediction picture is to be generated by using one reference picture;

FIG. 13 is a view showing a specific example of an encoded bit stream when a prediction picture is to be generated by using two reference pictures;

FIG. 14 is a view showing an example of a table of reference frame numbers, reference field numbers, and predictive parameters when information to be encoded is a top field according to the fourth embodiment of the present invention; and

FIG. 15 is a view showing an example of a table of reference frame numbers, reference field numbers, and predictive parameters when information to be encoded is a bottom field according to the fourth embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The embodiments of the present invention will be described below with reference to the several views of the accompanying drawing.

First Embodiment

(About Encoding Side)

FIG. 1 shows the arrangement of a video encoding apparatus according to the first embodiment of the present invention. A video signal 100 is input to the video encoding apparatus, for example, on a frame basis. The video signal 100 is input to a subtracter 101. The subtracter 101 calculates the difference between the video signal 100 and a prediction picture signal 212 to generate a predictive error signal. A mode selection switch 102 selects either the predictive error signal or the video signal 100. An orthogonal transformer 103 subjects the selected signal to an orthogonal transformation, e.g., a discrete cosine transform (DCT), and generates orthogonal transformation coefficient information, e.g., DCT coefficient information. The orthogonal transformation coefficient information is quantized by a quantizer 104 and branched into two paths. One branch of the quantized orthogonal transformation coefficient information 210 is guided to a variable-length encoder 111.

The other branch of the quantized orthogonal transformation coefficient information 210 is subjected, in sequence, to processing inverse to that of the quantizer 104 and orthogonal transformer 103 by a dequantizer (inverse quantizer) 105 and an inverse orthogonal transformer 106, and is thereby reconstructed into a predictive error signal. An adder 107 then adds the reconstructed predictive error signal to the prediction picture signal 212 input through the switch 112 to generate a local decoded video signal 211. The local decoded video signal 211 is input to a frame memory/prediction picture generator 108.
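Because quantization is lossy, the encoder reconstructs each block exactly as the decoder will and keeps that reconstruction, not the input, as its reference. The Python sketch below illustrates this loop under simplifying assumptions: the orthogonal transform (103/106) is omitted and treated as identity, a uniform scalar quantizer with a hypothetical step size `qstep` stands in for the quantizer 104, and the function names are illustrative only.

```python
def quantize(values, qstep):
    # Stand-in for quantizer 104: uniform quantization, rounding half away from zero
    return [int(v / qstep + (0.5 if v >= 0 else -0.5)) for v in values]


def dequantize(levels, qstep):
    # Stand-in for dequantizer 105: the inverse mapping (information is lost)
    return [lv * qstep for lv in levels]


def encode_block(block, prediction, qstep):
    """Return (levels sent to the variable-length encoder,
    local decoded block stored as a future reference)."""
    residual = [x - p for x, p in zip(block, prediction)]        # subtracter 101
    levels = quantize(residual, qstep)                            # branch to encoder 111
    recon_residual = dequantize(levels, qstep)                    # branch through 105/106
    local_decoded = [p + r for p, r in zip(prediction, recon_residual)]  # adder 107
    return levels, local_decoded
```

For example, the block `[100, 104, 98, 90]` with a flat prediction of 96 and `qstep = 4` yields levels `[1, 2, 1, -2]` and the local decoded block `[100, 104, 100, 88]`; the latter, not the input, is what the frame memory holds.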

The frame memory/prediction picture generator 108 selects one of a plurality of prepared combinations of reference frame numbers and predictive parameters. It calculates, in accordance with the predictive parameter of the selected combination, the linear sum of the video signals (local decoded video signal 211) of the reference frames indicated by the reference frame numbers of the selected combination, and adds an offset based on the predictive parameter to the result. In this case, a reference picture signal is thereby generated on a frame basis. The frame memory/prediction picture generator 108 then motion-compensates the reference picture signal by using a motion vector to generate the prediction picture signal 212.

In this process, the frame memory/prediction picture generator 108 generates motion vector information 214 and index information 215 indicating the selected combination of a reference frame number and a predictive parameter, and sends information necessary for selecting an encoding mode to a mode selector 110. The motion vector information 214 and index information 215 are input to the variable-length encoder 111. The frame memory/prediction picture generator 108 will be described in detail later.

The mode selector 110 selects an encoding mode for each macroblock on the basis of predictive information P from the frame memory/prediction picture generator 108, i.e., selects either the intraframe encoding mode or the motion compensated predictive interframe encoding mode, and outputs switch control signals M and S.

In the intraframe encoding mode, the switches 102 and 112 are switched to the A side by the switch control signals M and S, and the input video signal 100 is input to the orthogonal transformer 103. In the interframe encoding mode, the switches 102 and 112 are switched to the B side by the switch control signals M and S. As a consequence, the predictive error signal from the subtracter 101 is input to the orthogonal transformer 103, and the prediction picture signal 212 from the frame memory/prediction picture generator 108 is input to the adder 107. Mode information 213 is output from the mode selector 110 and input to the variable-length encoder 111.

The variable-length encoder 111 subjects the quantized orthogonal transformation coefficient information 210, mode information 213, motion vector information 214, and index information 215 to variable-length encoding. The variable-length codes generated by this operation are multiplexed by a multiplexer 114, and the resultant data is smoothed by an output buffer 115. Encoded data 116 output from the output buffer 115 is sent out to a transmission system or storage system (not shown).

An encoding controller 113 controls an encoding unit 112. More specifically, the encoding controller 113 monitors the buffer amount of the output buffer 115 and controls encoding parameters, such as the quantization step size of the quantizer 104, to keep the buffer amount constant.

(About Frame Memory/Prediction Picture Generator 108)

FIG. 2 shows the detailed arrangement of the frame memory/prediction picture generator 108 in FIG. 1. Referring to FIG. 2, the local decoded video signal 211 input from the adder 107 in FIG. 1 is stored in a frame memory set 202 under the control of a memory controller 201. The frame memory set 202 has a plurality of (N) frame memories FM1 to FMN for temporarily holding the local decoded video signal 211 as reference frames.

A predictive parameter controller 203 holds, as a table prepared in advance, a plurality of combinations of reference frame numbers and predictive parameters. On the basis of the video signal 100, the predictive parameter controller 203 selects the combination of the reference frame number of a reference frame and a predictive parameter that is used to generate the prediction picture signal 212, and outputs the index information 215 indicating the selected combination.

A multi-frame motion evaluator 204 generates a reference picture signal in accordance with the combination of reference frame number and predictive parameter selected by the predictive parameter controller 203, i.e., the combination indicated by the index information. The multi-frame motion evaluator 204 evaluates the motion amount and predictive error from this reference picture signal and the input video signal 100, and outputs the motion vector information 214 that minimizes the predictive error. A multi-frame motion compensator 205 motion-compensates each block in accordance with the motion vector, using the reference picture signal selected by the multi-frame motion evaluator 204, to generate the prediction picture signal 212.
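The motion evaluation described above can be pictured as a block-matching search. The Python sketch below is an illustration under stated assumptions, not the apparatus itself: it runs a full integer-pel search that minimizes the sum of absolute differences (SAD) against an already generated reference picture, whereas the FIG. 4 procedure also counts motion-vector bits in its evaluation value. The names `sad`, `motion_search`, and `motion_compensate` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, size, search_range):
    """Full search over integer displacements; returns (best_mv, best_sad).
    Candidates that would read outside `ref` are skipped."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + size <= h
                      and 0 <= bx + dx and bx + dx + size <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def motion_compensate(ref, bx, by, mv, size):
    """Fetch the displaced reference block, i.e. the prediction for this block."""
    dx, dy = mv
    return [row[bx + dx: bx + dx + size] for row in ref[by + dy: by + dy + size]]
```

In the apparatus, `ref` would be the reference picture signal already formed from the selected reference frame numbers and predictive parameters, so the same search is repeated for each candidate combination.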

The memory controller 201 assigns a reference frame number to the local decoded video signal of each frame and stores each frame in one of the frame memories FM1 to FMN of the frame memory set 202. For example, the frames are numbered sequentially starting from the frame nearest to the input picture. The same reference frame number may be assigned to different frames; in this case, for example, different predictive parameters are used. A frame near to the input picture is selected from the frame memories FM1 to FMN and sent to the predictive parameter controller 203.
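The numbering rule "starting from the frame nearest to the input picture" behaves like a bounded most-recent-first buffer. The following Python sketch models it with `collections.deque`; the class name `FrameMemorySet` is hypothetical, and the sketch does not model the case in which the same reference frame number is assigned to different frames.

```python
from collections import deque


class FrameMemorySet:
    """Hypothetical model of frame memories FM1..FMN: reference frame number 1
    is the most recently stored frame, 2 the frame before it, and so on."""

    def __init__(self, capacity):
        # When the deque is full, appendleft() discards the oldest frame at the right end
        self.frames = deque(maxlen=capacity)

    def store(self, frame):
        self.frames.appendleft(frame)       # the newest frame becomes number 1

    def get(self, ref_number):
        return self.frames[ref_number - 1]  # reference frame numbers are 1-based
```

With a capacity of 2, storing frames `f0`, `f1`, and `f2` in order leaves `f2` as reference frame number 1 and `f1` as number 2.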

(About Table of Combinations of Reference Frame Numbers and Prediction Parameters)

FIG. 3 shows an example of the table of combinations of reference frame numbers and predictive parameters, which is prepared in the predictive parameter controller 203. “Index” corresponds to the prediction pictures that can be selected for each block; in this case, there are eight types of prediction pictures. A reference frame number n is the number of a local decoded video used as a reference frame and, in this case, indicates the local decoded video of the frame n frames in the past.

When the prediction picture signal 212 is generated by using the picture signals of a plurality of reference frames stored in the frame memory set 202, a plurality of reference frame numbers are designated, and (the number of reference frames + 1) coefficients are designated as predictive parameters for each of a luminance signal (Y) and color difference signals (Cb and Cr). In this case, as indicated by equations (1) to (3), where n is the number of reference frames, n+1 predictive parameters Di (i=1, . . . , n+1) are prepared for the luminance signal Y; n+1 predictive parameters Ei (i=1, . . . , n+1), for the color difference signal Cb; and n+1 predictive parameters Fi (i=1, . . . , n+1), for the color difference signal Cr:

$$Y_{t} = \sum_{i=1}^{n} D_{i}\,Y_{t-i} + D_{n+1} \qquad (1)$$

$${Cb}_{t} = \sum_{i=1}^{n} E_{i}\,{Cb}_{t-i} + E_{n+1} \qquad (2)$$

$${Cr}_{t} = \sum_{i=1}^{n} F_{i}\,{Cr}_{t-i} + F_{n+1} \qquad (3)$$
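Equations (1) to (3) share one form: n weighting factors applied to the n reference frames plus one offset. The Python sketch below evaluates that form sample by sample for one component; the function name `linear_prediction` is hypothetical, and no rounding or clipping is applied because equations (1) to (3) specify none.

```python
def linear_prediction(refs, params):
    """Evaluate Y_t = sum_{i=1..n} D_i * Y_{t-i} + D_{n+1} for every sample.

    refs:   n reference sample lists; refs[0] is Y_{t-1}, refs[1] is Y_{t-2}, ...
    params: n+1 predictive parameters [D_1, ..., D_n, D_{n+1}]
    The same function serves Cb with E_i and Cr with F_i.
    """
    n = len(refs)
    if len(params) != n + 1:
        raise ValueError("expected n weighting factors plus one offset")
    weights, offset = params[:n], params[n]
    return [sum(w * r[k] for w, r in zip(weights, refs)) + offset
            for k in range(len(refs[0]))]
```

For instance, with parameters `[2, -1, 0]` and reference samples 100 (one frame back) and 80 (two frames back), the prediction is 2 × 100 − 80 = 120.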

This operation will be described in more detail with reference to FIG. 3. Referring to FIG. 3, the last numeral of each set of predictive parameters represents an offset, and the preceding numerals represent weighting factors (predictive coefficients). For index 0, the number of reference frames is n=1, the reference frame number is 1, and the predictive parameters are 1 and 0 for each of the luminance signal Y and color difference signals Cr and Cb. Predictive parameters of 1 and 0, as in this case, indicate that the local decoded video signal corresponding to reference frame number 1 is multiplied by 1 and offset 0 is added. In other words, the local decoded video signal corresponding to reference frame number 1 becomes the reference picture signal without any change.

For index 1, two reference frames, i.e., the local decoded video signals corresponding to reference frame numbers 1 and 2, are used. In accordance with predictive parameters 2, −1, and 0 for the luminance signal Y, the local decoded video signal corresponding to reference frame number 1 is doubled, the local decoded video signal corresponding to reference frame number 2 is subtracted from the result, and offset 0 is added. That is, extrapolation prediction from the local decoded video signals of two frames generates the reference picture signal. For the color difference signals Cr and Cb, since the predictive parameters are 1, 0, and 0, the local decoded video signal corresponding to reference frame number 1 is used as the reference picture signal without any change. The predictive scheme corresponding to index 1 is especially effective for a dissolving video.

For index 2, in accordance with predictive parameters 5/4 and 16, the local decoded video signal corresponding to reference frame number 1 is multiplied by 5/4, and offset 16 is added. For the color difference signals Cr and Cb, since the predictive parameter is 1, the color difference signals Cr and Cb become reference picture signals without any change. This predictive scheme is especially effective for a video fading in from a black frame.
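The three FIG. 3 entries described above can be written as a small lookup table. In the Python sketch below, the table holds only indices 0 to 2 as the text describes them; the chroma entry `[1, 0]` for index 2 assumes that the single parameter "1" is a weight with a zero offset. `TABLE` and `reference_sample` are hypothetical names, and `Fraction` keeps the 5/4 weight exact.

```python
from fractions import Fraction

# index -> (reference frame numbers, {component: [weights..., offset]})
TABLE = {
    0: ((1,), {"Y": [1, 0], "Cb": [1, 0], "Cr": [1, 0]}),
    1: ((1, 2), {"Y": [2, -1, 0], "Cb": [1, 0, 0], "Cr": [1, 0, 0]}),
    # Chroma [1, 0] for index 2 is an assumption (weight 1, offset 0)
    2: ((1,), {"Y": [Fraction(5, 4), 16], "Cb": [1, 0], "Cr": [1, 0]}),
}


def reference_sample(index, component, frame_samples):
    """Reference picture sample for one pixel position.

    frame_samples maps reference frame number -> sample value at that position.
    """
    ref_numbers, params = TABLE[index]
    coeffs = params[component]
    weights, offset = coeffs[:-1], coeffs[-1]
    return sum(w * frame_samples[n] for w, n in zip(weights, ref_numbers)) + offset
```

With luminance samples 100 in reference frame 1 and 80 in reference frame 2, index 0 gives 100 (plain copy), index 1 gives 120 (extrapolation, suited to a dissolve), and index 2 gives 5/4 × 100 + 16 = 141 (gain plus offset, suited to a fade-in).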

In this manner, reference picture signals can be selected on the basis of a plurality of predictive schemes with different combinations of the number of reference frames used and predictive parameters. This allows the embodiment to cope properly with fade videos and dissolving videos, whose picture quality has deteriorated for lack of a proper predictive scheme.

(About Sequence for Selecting Prediction Scheme and Determining Encoding Mode)

An example of a specific sequence for selecting a predictive scheme (a combination of a reference frame number and a predictive parameter) for each macroblock and determining an encoding mode in this embodiment will be described next with reference to FIG. 4.

First of all, the maximum assumable value is set to variable min_D (step S101). LOOP1 (step S102) indicates a repetition for selecting a predictive scheme in interframe encoding, and variable i represents the value of “index” in FIG. 3. In this case, to obtain an optimal motion vector for each predictive scheme, an evaluation value D of each index (each combination of a reference frame number and a predictive parameter) is calculated from the number of bits associated with the motion vector information 214 (the number of bits of the variable-length code output from the variable-length encoder 111 for the motion vector information 214) and the sum of the absolute values of the predictive errors, and the motion vector that minimizes the evaluation value D is selected (step S103). The evaluation value D is compared with min_D (step S104). If the evaluation value D is smaller than min_D, the evaluation value D is set to min_D, and index i is assigned to min_i (step S105).

An evaluation value D for intraframe encoding is then calculated (step S106) and compared with min_D (step S107). If this comparison indicates that min_D is smaller than the evaluation value D, mode MODE is determined as interframe encoding, and min_i is assigned to index information INDEX (step S108). If the evaluation value D is smaller, mode MODE is determined as intraframe encoding (step S109). Here, the evaluation value D is the estimated number of bits at the same quantization step size.
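The FIG. 4 flow reduces to a minimum search followed by one comparison. The Python sketch below mirrors steps S101 to S109 under two assumptions: the per-index evaluation values D (with the best motion vector already chosen in step S103) are supplied as input rather than computed, and a tie between min_D and the intraframe value is resolved in favor of intraframe encoding, which the text leaves unspecified. `choose_mode` is a hypothetical name.

```python
import math


def choose_mode(inter_costs, intra_cost):
    """Mode decision following the FIG. 4 steps.

    inter_costs: dict mapping index i -> evaluation value D for that predictive scheme
    intra_cost:  evaluation value D for intraframe encoding
    Returns ("inter", min_i) or ("intra", None).
    """
    min_d, min_i = math.inf, None        # S101: min_D starts at the maximum value
    for i, d in inter_costs.items():     # LOOP1 (S102) over the indices
        if d < min_d:                    # S104
            min_d, min_i = d, i          # S105
    if min_d < intra_cost:               # S107
        return "inter", min_i            # S108: INDEX = min_i
    return "intra", None                 # S109
```

For evaluation values `{0: 500, 1: 320, 2: 410}` and an intraframe value of 600, index 1 is chosen with interframe encoding; if the intraframe value were 300, intraframe encoding would win.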

(About Decoding Side)

A video decoding apparatus corresponding to the video encoding apparatus shown in FIG. 1 will be described next. FIG. 5 shows the arrangement of the video decoding apparatus according to this embodiment. Encoded data 300 sent out from the video encoding apparatus shown in FIG. 1 through a transmission system or storage system is temporarily stored in an input buffer 301 and demultiplexed for each frame by a demultiplexer 302 on the basis of a syntax. The resultant data is input to a variable-length decoder 303. The variable-length decoder 303 decodes the variable-length code of each syntax of the encoded data 300 to reproduce a quantized orthogonal transformation coefficient, mode information 413, motion vector information 414, and index information 415.

Of the reproduced information, the quantized orthogonal transformation coefficient is dequantized by a dequantizer 304 and inversely orthogonal-transformed by an inverse orthogonal transformer 305. If the mode information 413 indicates the intraframe encoding mode, a reproduction video signal is output from the inverse orthogonal transformer 305 and then output as a reproduction video signal 310 through an adder 306.

If the mode information 413 indicates the interframe encoding mode, a predictive error signal is output from the inverse orthogonal transformer 305, and a mode selection switch 309 is turned on. The adder 306 adds the prediction picture signal 412 output from a frame memory/prediction picture generator 308 to the predictive error signal, and the reproduction video signal 310 is output as a result. The reproduction video signal 310 is stored as a reference picture signal in the frame memory/prediction picture generator 308.

The mode information 413, motion vector information 414, and index information 415 are input to the frame memory/prediction picture generator 308. The mode information 413 is also input to the mode selection switch 309. In the intraframe encoding mode, the mode selection switch 309 is turned off; in the interframe encoding mode, it is turned on.

Like the frame memory/prediction picture generator 108 on the encoding side in FIG. 1, the frame memory/prediction picture generator 308 holds a plurality of prepared combinations of reference frame numbers and predictive parameters as a table, and selects the combination indicated by the index information 415 from the table. The linear sum of the video signals (reproduction video signal 310) of the reference frames indicated by the reference frame numbers of the selected combination is calculated in accordance with the predictive parameter of the selected combination, and an offset based on the predictive parameter is added to the result. A reference picture signal is thereby generated. The generated reference picture signal is then motion-compensated by using the motion vector indicated by the motion vector information 414, thereby generating a prediction picture signal 412.
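On the decoding side the same table is indexed by the received index information, so the decoder rebuilds the identical prediction. The Python sketch below walks one inter block through that chain (table lookup, weighted reference, motion compensation, adder 306). To stay short it works on a single row of samples with a horizontal integer motion vector, and the final clip to 0 to 255 is an assumption (the first embodiment does not specify clipping). `decode_inter_row` and its parameters are hypothetical.

```python
def decode_inter_row(table, index, frames, mv, residual, start):
    """Decode one row segment of an inter-coded block.

    table:    index -> (reference frame numbers, [weights..., offset])
    frames:   reference frame number -> list of reproduced samples (one row)
    mv:       horizontal integer motion vector
    residual: decoded predictive error samples for the segment
    start:    position of the segment within the row
    """
    ref_numbers, params = table[index]              # selected by index information 415
    weights, offset = params[:-1], params[-1]
    width = len(residual)
    # Reference picture signal: weighted sum of the referenced frames plus offset
    reference = [sum(w * frames[n][x] for w, n in zip(weights, ref_numbers)) + offset
                 for x in range(len(frames[ref_numbers[0]]))]
    # Motion compensation: take the displaced segment as the prediction
    prediction = reference[start + mv: start + mv + width]
    # Adder 306: prediction plus predictive error, clipped to 8 bits (assumption)
    return [max(0, min(255, p + r)) for p, r in zip(prediction, residual)]
```

For example, with index 1 mapped to frames (1, 2) and parameters `[2, -1, 0]`, the decoder reproduces the encoder's extrapolated reference before adding the residual, so both sides stay synchronized.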

(About Frame Memory/Prediction Picture Generator 308)

FIG. 6 shows the detailed arrangement of the frame memory/prediction picture generator 308 in FIG. 5. Referring to FIG. 6, the reproduction video signal 310 output from the adder 306 in FIG. 5 is stored in a frame memory set 402 under the control of a memory controller 401. The frame memory set 402 has a plurality of (N) frame memories FM1 to FMN for temporarily holding the reproduction video signal 310 as reference frames.

A predictive parameter controller 403 holds in advance combinations of reference frame numbers and predictive parameters as a table like the one shown in FIG. 3. On the basis of the index information 415 from the variable-length decoder 303 in FIG. 5, the predictive parameter controller 403 selects the combination of the reference frame number of a reference frame and a predictive parameter used to generate the prediction picture signal 412. A multi-frame motion compensator 404 generates a reference picture signal in accordance with the combination of reference frame number and predictive parameter selected by the predictive parameter controller 403, and motion-compensates each block using this reference picture signal in accordance with the motion vector indicated by the motion vector information 414 from the variable-length decoder 303 in FIG. 5, thereby generating the prediction picture signal 412.

Second Embodiment

The second embodiment of the present invention will be described next with reference to FIGS. 7 and 8. Since the overall arrangements of a video encoding apparatus and video decoding apparatus in this embodiment are almost the same as those in the first embodiment, only the differences from the first embodiment will be described.

This embodiment describes an example of how predictive parameters are expressed in a scheme capable of designating a plurality of reference frame numbers in accordance with mode information on a macroblock basis. A reference frame number is identified by the mode information for each macroblock. This embodiment therefore uses tables of predictive parameters as shown in FIGS. 7 and 8 instead of the table of combinations of reference frame numbers and predictive parameters used in the first embodiment. That is, the index information does not indicate a reference frame number; it designates only a combination of predictive parameters.

The table in FIG. 7 shows an example of combinations of predictive parameters when the number of reference frames is one. As predictive parameters, (the number of reference frames + 1) parameters, i.e., two parameters (one weighting factor and one offset), are designated for each of a luminance signal (Y) and color difference signals (Cb and Cr).

The table in FIG. 8 shows an example of combinations of predictive parameters when the number of reference frames is two. In this case, as predictive parameters, (the number of reference frames + 1) parameters, i.e., three parameters (two weighting factors and one offset), are designated for each of a luminance signal (Y) and color difference signals (Cb and Cr). As in the first embodiment, these tables are prepared on both the encoding side and the decoding side.
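In the second embodiment, the number and identity of the reference frames come from the macroblock mode information, and the index only picks a row of parameters from the table for that number of frames. The Python sketch below illustrates that split for luminance only. The parameter values in `TABLES` are invented placeholders, not the contents of FIGS. 7 and 8, and `predict_sample` is a hypothetical name.

```python
# Hypothetical stand-ins for FIG. 7 (one reference frame) and FIG. 8 (two):
# each row holds (number of reference frames + 1) parameters, offset last.
TABLES = {
    1: [[1, 0], [2, -16]],             # [weight, offset]
    2: [[1, 0, 0], [0.5, 0.5, 0]],     # [weight 1, weight 2, offset]
}


def predict_sample(ref_samples, index):
    """ref_samples: samples from the reference frames named by the mode information.
    index: selects only the predictive parameters, never a reference frame."""
    params = TABLES[len(ref_samples)][index]
    assert len(params) == len(ref_samples) + 1
    # zip() stops at the shorter list, so only the weights pair with samples
    return sum(w * s for w, s in zip(params, ref_samples)) + params[-1]
```

The same index value therefore means different parameters depending on how many reference frames the mode information designates.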

Third Embodiment

The third embodiment of the present invention will be described with reference to FIGS. 9 and 10. Since the overall arrangements of a video encoding apparatus and video decoding apparatus in this embodiment are almost the same as those in the first embodiment, only the differences from the first and second embodiments will be described below.

In the first and second embodiments, a video is managed on a frame basis. In this embodiment, however, a video is managed on a picture basis. If both progressive signals and interlaced signals exist as input picture signals, pictures are not necessarily encoded on a frame basis. In consideration of this, a picture is one of (a) one frame of a progressive signal, (b) one frame generated by merging the two fields of an interlaced signal, or (c) one field of an interlaced signal.

If a picture to be encoded has a frame structure like (a) or (b), a reference picture used in motion compensation prediction is also managed as a frame, regardless of whether the encoded picture serving as the reference picture has a frame structure or a field structure, and a reference picture number is assigned to this picture. Likewise, if a picture to be encoded has a field structure like (c), a reference picture used in motion compensation prediction is also managed as a field, regardless of whether the encoded picture serving as the reference picture has a frame structure or a field structure, and a reference picture number is assigned to this picture.

Equations (4), (5), and (6) are examples of predictive equations for reference picture numbers and predictive parameters, which are prepared in the predictive parameter controller 203. These examples are predictive equations for generating a prediction picture signal by motion compensation prediction using one reference picture signal.

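$$Y = \mathrm{clip}\left(\left(\left(D_1(i) \times R_Y(i) + 2^{L_Y - 1}\right) \gg L_Y\right) + D_2(i)\right) \qquad (4)$$

$$Cb = \mathrm{clip}\left(\left(\left(E_1(i) \times \left(R_{Cb}(i) - 128\right) + 2^{L_C - 1}\right) \gg L_C\right) + E_2(i) + 128\right) \qquad (5)$$

$$Cr = \mathrm{clip}\left(\left(\left(F_1(i) \times \left(R_{Cr}(i) - 128\right) + 2^{L_C - 1}\right) \gg L_C\right) + F_2(i) + 128\right) \qquad (6)$$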

where Y is the prediction picture signal of the luminance signal; Cb and Cr are the prediction picture signals of the two color difference signals; R_Y(i), R_Cb(i), and R_Cr(i) are the pixel values of the luminance signal and two color difference signals of the reference picture signal with index i; D₁(i) and D₂(i) are the predictive coefficient and offset of the luminance signal with index i; E₁(i) and E₂(i) are the predictive coefficient and offset of the color difference signal Cb with index i; and F₁(i) and F₂(i) are the predictive coefficient and offset of the color difference signal Cr with index i. Index i takes a value from 0 to (the maximum number of reference pictures − 1) and is encoded for each block to be encoded (e.g., for each macroblock). The resultant data is then transmitted to the video decoding apparatus.

The predictive parameters D₁(i), D₂(i), E₁(i), E₂(i), F₁(i), and F₂(i) are either values determined in advance between the video encoding apparatus and the video decoding apparatus, or values encoded in a unit of encoding such as a frame, field, or slice together with the encoded data transmitted from the video encoding apparatus to the video decoding apparatus. In either case, these parameters are shared by the two apparatuses.

Equations (4), (5), and (6) are predictive equations in which powers of 2, i.e., 2, 4, 8, 16, . . . , are selected as the denominators of the predictive coefficients by which the reference picture signals are multiplied. These predictive equations eliminate the need for division and can be calculated with arithmetic shifts, which avoids the large increase in calculation cost that division would cause.

In equations (4), (5), and (6), “>>” in a>>b denotes an operator that arithmetically shifts an integer a to the right by b bits. The function “clip” is a clipping function that sets the value in “( )” to 0 when it is smaller than 0 and to 255 when it is larger than 255.
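As a worked illustration with hypothetical values (not taken from FIGS. 9 and 10), take the shift amount L_Y = 5, D₁(i) = 40, D₂(i) = 3, and a reference pixel R_Y(i) = 100, so the effective coefficient is 40/2⁵ = 1.25. Equation (4) then needs only a multiplication, an addition, and a shift (≫ is the right shift written “>>” above):

$$
\begin{aligned}
Y &= \mathrm{clip}\big(((40 \times 100 + 2^{4}) \gg 5) + 3\big) \\
  &= \mathrm{clip}\big((4016 \gg 5) + 3\big) = \mathrm{clip}(125 + 3) = 128,
\end{aligned}
$$

which agrees with 1.25 × 100 + 3 = 128 without any division. Adding 2^{L_Y−1} before the shift makes the shift round to the nearest integer instead of simply truncating.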

Here, L_Y is the shift amount for the luminance signal, and L_C is the shift amount for the color difference signals. The shift amounts L_Y and L_C either take values determined in advance between the video encoding apparatus and the video decoding apparatus, or are encoded by the video encoding apparatus, together with the table and encoded data, in a predetermined unit of encoding (e.g., a frame, field, or slice) and transmitted to the video decoding apparatus. This allows the two apparatuses to share the shift amounts L_Y and L_C.
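Equations (4) to (6) can be sketched per sample as follows. This is a minimal illustration assuming 8-bit samples and shift amounts of at least 1; the function names are hypothetical and not part of the described apparatus.

```python
def clip(v: int) -> int:
    """Clamp a value to the 8-bit range [0, 255], as the clip function does."""
    return max(0, min(255, v))


def weighted_luma(r_y: int, d1: int, d2: int, l_y: int) -> int:
    """Equation (4): Y = clip(((D1 * R_Y + 2^(L_Y - 1)) >> L_Y) + D2).

    Assumes l_y >= 1 so that 2^(L_Y - 1) is an integer rounding term.
    """
    return clip(((d1 * r_y + (1 << (l_y - 1))) >> l_y) + d2)


def weighted_chroma(r_c: int, w1: int, w2: int, l_c: int) -> int:
    """Equations (5)/(6): the chroma sample is centred on 128 before weighting.

    (w1, w2) stands for (E1, E2) for Cb or (F1, F2) for Cr. Python's >> on a
    negative int is an arithmetic (floor) shift, matching the text.
    """
    return clip(((w1 * (r_c - 128) + (1 << (l_c - 1))) >> l_c) + w2 + 128)
```

With D₁ = 2^{L_Y} and D₂ = 0 (or the chroma equivalent), both functions return the reference sample unchanged, which is a convenient sanity check on the rounding term.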

In this embodiment, tables of combinations of reference picture numbers and predictive parameters like those shown in FIGS. 9 and 10 are prepared in the predictive parameter controller 203 in FIG. 2. In FIGS. 9 and 10, index i identifies the prediction pictures that can be selected for each block; here, four types of prediction pictures correspond to the index values 0 to 3. “Reference picture number” is, in other words, the number of the local decoded video signal used as a reference picture.

“Flag” indicates whether or not a predictive equation using a predictive parameter is applied to the reference picture number indicated by index i. If Flag is “0”, motion compensation prediction is performed by using the local decoded video signal corresponding to the reference picture number indicated by index i, without using any predictive parameter. If Flag is “1”, a prediction picture is generated according to equations (4), (5), and (6) by using the local decoded video signal and the predictive parameters corresponding to the reference picture number indicated by index i, thus performing motion compensation prediction. The Flag information either takes a value determined in advance between the video encoding apparatus and the video decoding apparatus, or is encoded by the video encoding apparatus, together with the table and encoded data, in a predetermined unit of encoding (e.g., a frame, field, or slice) and transmitted to the video decoding apparatus. This allows the two apparatuses to share the Flag information.

In the tables of FIGS. 9 and 10, for reference picture number 105, a prediction picture is generated by using a predictive parameter when index i=0, whereas motion compensation prediction is performed without any predictive parameter when i=1. As this shows, a plurality of predictive schemes may exist for the same reference picture number.

The table shown in FIG. 9 assigns the predictive parameters D₁(i), D₂(i), E₁(i), E₂(i), F₁(i), and F₂(i) to the luminance signal and the two color difference signals in correspondence with equations (4), (5), and (6). FIG. 10 shows an example of a table in which predictive parameters are assigned only to the luminance signal. In general, the number of bits of a color difference signal is not very large compared with that of a luminance signal. Therefore, to reduce both the amount of calculation required to generate a prediction picture and the number of bits transmitted in the table, a table is prepared in which the predictive parameters for the color difference signals are omitted, as shown in FIG. 10, and predictive parameters are assigned only to the luminance signal. In this case, only equation (4) is used as a predictive equation.
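A combination table with Flag can be modeled as a small lookup keyed by index i. The sketch below is a luma-only table in the spirit of FIG. 10; every numeric value, the `TableEntry` type, and the `ref_samples` stand-in for a picture buffer are hypothetical. Index 0 and index 1 both point at reference picture number 105, one weighted and one not, mirroring the example above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Params = Tuple[int, int]  # (predictive coefficient, offset)


@dataclass(frozen=True)
class TableEntry:
    ref_pic_number: int            # which local decoded picture to use
    flag: int                      # 1: apply equation (4); 0: plain MC
    luma: Optional[Params] = None  # (D1, D2)
    cb: Optional[Params] = None    # (E1, E2); filled only in a FIG. 9-style table
    cr: Optional[Params] = None    # (F1, F2); filled only in a FIG. 9-style table


# Hypothetical luma-only table (values invented for illustration).
TABLE: Dict[int, TableEntry] = {
    0: TableEntry(105, 1, luma=(40, 3)),
    1: TableEntry(105, 0),
    2: TableEntry(104, 1, luma=(28, -2)),
    3: TableEntry(103, 0),
}

L_Y = 5  # luma shift amount, assumed shared by encoder and decoder


def predict_luma_sample(index: int, ref_samples: Dict[int, int]) -> int:
    """Luma prediction for one sample selected by index i.

    ref_samples maps a reference picture number to the already
    motion-compensated sample value fetched from that picture.
    """
    entry = TABLE[index]
    r = ref_samples[entry.ref_pic_number]
    if entry.flag == 0 or entry.luma is None:
        return r  # motion compensation without any predictive parameter
    d1, d2 = entry.luma
    return max(0, min(255, ((d1 * r + (1 << (L_Y - 1))) >> L_Y) + d2))
```

For example, with `ref_samples = {105: 100, 104: 80, 103: 60}`, indexes 0 and 1 give 128 and 100 from the same reference picture.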

Equations (7) to (12) are predictive equations for the case in which a plurality of reference pictures (two in this case) are used.

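$$P_Y(i) = \big((D_1(i) \times R_Y(i) + 2^{L_Y - 1}) \gg L_Y\big) + D_2(i) \qquad (7)$$

$$P_{Cb}(i) = \big((E_1(i) \times (R_{Cb}(i) - 128) + 2^{L_C - 1}) \gg L_C\big) + E_2(i) + 128 \qquad (8)$$

$$P_{Cr}(i) = \big((F_1(i) \times (R_{Cr}(i) - 128) + 2^{L_C - 1}) \gg L_C\big) + F_2(i) + 128 \qquad (9)$$

$$Y = \mathrm{clip}\big((P_Y(i) + P_Y(j) + 1) \gg 1\big) \qquad (10)$$

$$Cb = \mathrm{clip}\big((P_{Cb}(i) + P_{Cb}(j) + 1) \gg 1\big) \qquad (11)$$

$$Cr = \mathrm{clip}\big((P_{Cr}(i) + P_{Cr}(j) + 1) \gg 1\big) \qquad (12)$$

where P_Y(j), P_Cb(j), and P_Cr(j) are obtained from equations (7) to (9) with index j in place of index i.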

The predictive parameters D₁(i), D₂(i), E₁(i), E₂(i), F₁(i), and F₂(i), the shift amounts L_Y and L_C, and Flag either take values determined in advance between the video encoding apparatus and the video decoding apparatus, or are encoded, together with the encoded data, in a unit of encoding such as a frame, field, or slice and transmitted from the video encoding apparatus to the video decoding apparatus. This allows the two apparatuses to share this information.
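The two-reference case of equations (7) to (12) can be sketched per sample as below. It is an illustration under the same assumptions as the single-reference case (8-bit samples, shift amounts of at least 1); the function names are hypothetical. Note that the intermediate values P(i) and P(j) are not clipped; clipping is applied only after the rounded average.

```python
def clip(v: int) -> int:
    """Clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, v))


def weighted_component(r: int, w1: int, w2: int, shift: int, centre: int = 0) -> int:
    """Equations (7)-(9), unclipped.

    centre=0 gives the luma form (7); centre=128 gives the chroma forms (8)/(9).
    """
    return ((w1 * (r - centre) + (1 << (shift - 1))) >> shift) + w2 + centre


def bipred(p_i: int, p_j: int) -> int:
    """Equations (10)-(12): rounded average of two predictions, then clip."""
    return clip((p_i + p_j + 1) >> 1)


# Luma example with invented parameters: index i weighted, index j unit weight.
p_i = weighted_component(100, 40, 3, 5)   # 128
p_j = weighted_component(120, 32, 0, 5)   # 120
y = bipred(p_i, p_j)                      # 124
```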

If a picture to be decoded is a picture having a frame structure, a reference picture used for motion compensation prediction is also managed as a frame, regardless of whether the decoded picture serving as the reference picture has a frame structure or a field structure. A reference picture number is assigned to this picture. Likewise, if a picture to be decoded is a picture having a field structure, a reference picture used for motion compensation prediction is also managed as a field, regardless of whether the decoded picture serving as the reference picture has a frame structure or a field structure. A reference picture number is assigned to this picture.

(About Syntax of Index Information)

FIG. 11 shows an example of a syntax in which index information is encoded for each block. First, mode information MODE is present for each block. The mode information MODE determines whether index information IDi, indicating the value of index i, and index information IDj, indicating the value of index j, are encoded. After the encoded index information, the encoded motion vector information for each block is added: motion vector information MVi for the motion compensation prediction of index i and motion vector information MVj for the motion compensation prediction of index j.

(About Data Structure of Encoded Bit Stream)

FIG. 12 shows a specific example of an encoded bit stream for each block when a prediction picture is generated by using one reference picture. The index information IDi is set after the mode information MODE, and the motion vector information MVi is set after that. The motion vector information MVi is generally two-dimensional vector information. Depending on the motion compensation method in the block, which is indicated by the mode information, a plurality of two-dimensional vectors may also be sent.

FIG. 13 shows a specific example of an encoded bit stream for each block when a prediction picture is generated by using two reference pictures. Index information IDi and index information IDj are set after the mode information MODE, and motion vector information MVi and motion vector information MVj are set after them. The motion vector information MVi and motion vector information MVj are generally two-dimensional vector information. Depending on the motion compensation method in the block indicated by the mode information, a plurality of two-dimensional vectors may also be sent.
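The element order of FIGS. 11 to 13 (MODE, then index information, then motion vector information) can be sketched as a function that lays out one block's syntax elements. This is an ordering sketch only: the mode names in `REFS_PER_MODE` are hypothetical, since the text says only that MODE decides whether IDi and IDj are present, and no entropy coding is performed.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical)

# Hypothetical mode names mapped to the number of indexes they carry.
REFS_PER_MODE = {"NO_REF": 0, "ONE_REF": 1, "TWO_REF": 2}


def block_syntax(mode: str,
                 indexes: Sequence[int],
                 motion_vectors: Sequence[Sequence[MotionVector]]
                 ) -> List[Tuple[str, object]]:
    """Return one block's syntax elements in bit-stream order.

    indexes[k] is the value of index i (k=0) or index j (k=1);
    motion_vectors[k] lists the 2-D vectors for that prediction, since a
    mode may carry more than one vector per reference.
    """
    n = REFS_PER_MODE[mode]
    if len(indexes) != n or len(motion_vectors) != n:
        raise ValueError("MODE does not match the number of references")
    names = ("i", "j")
    out: List[Tuple[str, object]] = [("MODE", mode)]
    out += [("ID" + names[k], indexes[k]) for k in range(n)]   # IDi, IDj
    for k in range(n):                                         # MVi..., MVj...
        out += [("MV" + names[k], mv) for mv in motion_vectors[k]]
    return out
```

For a two-reference block, `block_syntax("TWO_REF", [0, 2], [[(3, -1)], [(-2, 0)]])` yields MODE, IDi, IDj, MVi, MVj, matching FIG. 13.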

Note that the above structures of a syntax and bit stream can be equally applied to all the embodiments.

Fourth Embodiment

The fourth embodiment of the present invention will be described next with reference to FIGS. 14 and 15. Since the overall arrangements of a video encoding apparatus and video decoding apparatus in this embodiment are almost the same as those in the first embodiment, only differences from the first, second, and third embodiments will be described. In the third embodiment, encoding on a frame basis and encoding on a field basis are switched for each picture. In the fourth embodiment, encoding on a frame basis and encoding on a field basis are switched for each macroblock.

When encoding on a frame basis and encoding on a field basis are switched for each macroblock, the same reference picture number indicates different pictures, even within the same picture, depending on whether a macroblock is encoded on the frame basis or on the field basis. For this reason, with the tables shown in FIGS. 9 and 10 used in the third embodiment, a proper prediction picture signal may not be generated.

To solve this problem, in this embodiment, tables of combinations of reference picture numbers and predictive parameters like those shown in FIGS. 14 and 15 are prepared in a predictive parameter controller 203 in FIG. 2. It is assumed that a macroblock encoded on the field basis uses the same predictive parameter as the one corresponding to the reference picture number (reference frame index number) used when the macroblock is encoded on the frame basis.

FIG. 14 shows a table used when the macroblock is encoded on a field basis and the picture to be encoded is a top field. The upper and lower rows of each field index column correspond to the top field and bottom field, respectively. As shown in FIG. 14, frame index j corresponds to field index k=2j for the top field and k=2j+1 for the bottom field. Likewise, reference frame number m corresponds to reference field number n=2m for the top field and n=2m+1 for the bottom field.

FIG. 15 shows a table used when the macroblock is encoded on a field basis and the picture to be encoded is a bottom field. As in the table shown in FIG. 14, the upper and lower rows of each field index column correspond to the top field and the bottom field, respectively. In the table in FIG. 15, frame index j corresponds to field index k=2j+1 for the top field and k=2j for the bottom field. This makes it possible to assign a small value of field index k to the in-phase bottom field. The relationship between reference frame number m and reference field number n is the same as that in the table in FIG. 14.
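The index relationships of FIGS. 14 and 15 can be sketched as small helper functions. The names are hypothetical. Because both fields of frame index j map back to j (k >> 1), a field-coded macroblock picks up the same predictive parameters as frame index j, as assumed above; this appears to be what the claims mean by two index values for different reference images indicating the same combination.

```python
def field_index(frame_index: int, ref_is_top: bool, current_is_top: bool) -> int:
    """Field index k of one field of frame index j.

    FIG. 14 (current picture is a top field): top -> 2j, bottom -> 2j+1.
    FIG. 15 (current picture is a bottom field): top -> 2j+1, bottom -> 2j.
    In both cases the same-parity (in-phase) field gets the smaller index.
    """
    same_parity = ref_is_top == current_is_top
    return 2 * frame_index + (0 if same_parity else 1)


def frame_index_of(field_idx: int) -> int:
    """Both fields of frame index j map back to j, so they share its parameters."""
    return field_idx >> 1


def reference_field_number(ref_frame_number: int, is_top: bool) -> int:
    """Reference field number n = 2m (top field) or 2m+1 (bottom field)."""
    return 2 * ref_frame_number + (0 if is_top else 1)


# Hypothetical per-frame-index parameters (D1, D2), reused for both fields.
frame_params = {0: (32, 0), 1: (40, 3)}
k_top = field_index(1, ref_is_top=True, current_is_top=False)     # 3
k_bottom = field_index(1, ref_is_top=False, current_is_top=False)  # 2
```

Here `frame_params[frame_index_of(k_top)]` and `frame_params[frame_index_of(k_bottom)]` both return the parameters of frame index 1.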

When the macroblock is to be encoded on a field basis, a frame index and a field index are encoded as index information by using the tables shown in FIGS. 14 and 15. When the macroblock is to be encoded on a frame basis, only the frame index common to the tables in FIGS. 14 and 15 is encoded as index information.

In this embodiment, predictive parameters are assigned to frames and fields by using one table. However, a table for frames and a table for fields may be prepared separately for one picture or slice.

Each embodiment described above has exemplified a video encoding/decoding scheme using orthogonal transformation on a block basis. However, the technique of the present invention described in the above embodiments can also be used with another transformation technique, such as wavelet transformation.

Video encoding and decoding processing according to the present invention may be implemented as hardware (apparatus) or as software using a computer. Some processing may be implemented by hardware, and the other processing may be performed by software. According to the present invention, there can be provided a program for causing a computer to execute the above video encoding or video decoding, or a storage medium storing the program.

1. A video encoding method for subjecting an input video image having luminance and two color differences to prediction encoding, comprising: determining whether a unit-of-encoding of a to-be-encoded block is a frame or a field; obtaining, for the to-be-encoded block, a given number of indexes each indicating (A) one combination of a plurality of combinations comprising (a) a weighting factor for each luminance and for each of two color differences and (b) an offset for each luminance and for each of two color differences and (B) a reference image; generating, for each luminance and for each of two color differences, a prediction image by multiplying the given number of reference images by the weighting factors corresponding to the reference images and adding the given number of offsets; generating a prediction error signal by calculating an error between the input video image and the prediction image; generating a quantized orthogonal transform coefficient by subjecting the prediction error signal to orthogonal transform and quantization; and encoding, for the to-be-encoded block, (1) the quantized orthogonal transform coefficient and (2) the index and encoding, for one or more to-be-encoded blocks, the plurality of combinations; wherein when the unit-of-encoding is the frame, respective possible values of the index indicate different combinations of the combinations, respectively, and when the unit-of-encoding is the field, two possible values corresponding to the different reference images indicate the same combination.

2. A video encoding apparatus for subjecting an input video image having luminance and two color differences to prediction encoding, comprising: a determining module configured to determine whether a unit-of-encoding of a to-be-encoded block is a frame or a field; an obtaining module configured to obtain, for the to-be-encoded block, a given number of indexes each indicating (A) one combination of a plurality of combinations comprising (a) a weighting factor for each luminance and for each of two color differences and (b) an offset for each luminance and for each of two color differences and (B) a reference image; a first generator configured to generate, for each luminance and for each of two color differences, a prediction image by multiplying the given number of reference images by the weighting factors corresponding to the reference images and adding the given number of offsets; a second generator configured to generate a prediction error signal by calculating an error between the input video image and the prediction image; a third generator configured to generate a quantized orthogonal transform coefficient by subjecting the prediction error signal to orthogonal transform and quantization; and an encoder configured to encode, for the to-be-encoded block, (1) the quantized orthogonal transform coefficient and (2) the index and encode, for one or more to-be-encoded blocks, the plurality of combinations; wherein when the unit-of-encoding is the frame, respective possible values of the index indicate different combinations of the combinations, respectively, and when the unit-of-encoding is the field, two possible values corresponding to the different reference images indicate the same combination.